fast-import: disallow "." and ".." path components #1831
base: master
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
On the Git mailing list, Eric Sunshine wrote:
On Mon, Nov 25, 2024 at 12:58 PM Elijah Newren via GitGitGadget
<[email protected]> wrote:
> If a user specified e.g.
> M 100644 :1 ../some-file
> then fast-import previously would happily create a git history where
> there is a tree in the top-level directory named "..", and with a file
> inside that directory named "some-file". The top-level ".." directory
> causes problems. While git checkout will die with errors and fsck will
> report hasDotdot problems, the user is going to have problems trying to
> remove the problematic file. Simply avoid creating this bad history in
> the first place.
>
> Signed-off-by: Elijah Newren <[email protected]>
> ---
> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> @@ -1466,6 +1466,9 @@ static int tree_content_set(
> e->name = to_atom(p, n);
> + if (!strcmp(e->name->str_dat, ".") || !strcmp(e->name->str_dat, "..")) {
> + die("path %s contains invalid component", p);
> + }
Probably not worth a reroll, but is_dot_or_dotdot() might be usable here.
(And -- style nit -- the braces could be dropped.) |
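For readers unfamiliar with the helper Eric mentions, the check it performs can be sketched in a few lines. The function below is a stand-alone illustration with a hypothetical name (`sketch_is_dot_or_dotdot`), not a copy of Git's own definition; it assumes a NUL-terminated component name that contains no slashes.

```c
/*
 * Hypothetical stand-in for the is_dot_or_dotdot() helper mentioned above.
 * Returns 1 when `name` is exactly "." or "..", and 0 otherwise.
 */
int sketch_is_dot_or_dotdot(const char *name)
{
	if (name[0] != '.')
		return 0;
	if (name[1] == '\0')
		return 1;                          /* "." */
	return name[1] == '.' && name[2] == '\0';  /* ".." */
}
```

Compared with two `strcmp()` calls, a helper like this names the intent at the call site, which is presumably why it was suggested.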
If a user specified e.g. M 100644 :1 ../some-file then fast-import previously would happily create a git history where there is a tree in the top-level directory named "..", and with a file inside that directory named "some-file". The top-level ".." directory causes problems. While git checkout will die with errors and fsck will report hasDotdot problems, the user is going to have problems trying to remove the problematic file. Simply avoid creating this bad history in the first place. Signed-off-by: Elijah Newren <[email protected]>
86ea3df to 447b679
On the Git mailing list, Elijah Newren wrote:
On Mon, Nov 25, 2024 at 10:15 AM Eric Sunshine <[email protected]> wrote:
>
> On Mon, Nov 25, 2024 at 12:58 PM Elijah Newren via GitGitGadget
> <[email protected]> wrote:
> > If a user specified e.g.
> > M 100644 :1 ../some-file
> > then fast-import previously would happily create a git history where
> > there is a tree in the top-level directory named "..", and with a file
> > inside that directory named "some-file". The top-level ".." directory
> > causes problems. While git checkout will die with errors and fsck will
> > report hasDotdot problems, the user is going to have problems trying to
> > remove the problematic file. Simply avoid creating this bad history in
> > the first place.
> >
> > Signed-off-by: Elijah Newren <[email protected]>
> > ---
> > diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> > @@ -1466,6 +1466,9 @@ static int tree_content_set(
> > e->name = to_atom(p, n);
> > + if (!strcmp(e->name->str_dat, ".") || !strcmp(e->name->str_dat, "..")) {
> > + die("path %s contains invalid component", p);
> > + }
>
> Probably not worth a reroll, but is_dot_or_dotdot() might be usable here.
>
> (And -- style nit -- the braces could be dropped.)
Good catches, thanks. I think they are worth a reroll; I'll send one in. |
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
This patch series was integrated into seen via git@0a88e9e. |
On the Git mailing list, Patrick Steinhardt wrote:
On Mon, Nov 25, 2024 at 07:00:48PM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <[email protected]>
>
> If a user specified e.g.
> M 100644 :1 ../some-file
> then fast-import previously would happily create a git history where
> there is a tree in the top-level directory named "..", and with a file
> inside that directory named "some-file". The top-level ".." directory
> causes problems. While git checkout will die with errors and fsck will
> report hasDotdot problems, the user is going to have problems trying to
> remove the problematic file. Simply avoid creating this bad history in
> the first place.
Makes sense.
More generally this made me wonder whether we should maybe extract some
bits out of "fsck.c" so that we don't have to duplicate the checks done
there in git-fast-import(1). This would for example include checks for
".git" and its HFS/NTFS variants as well as tree entry length checks for
names longer than 4096 characters.
This of course does not have to be part of your patch, which looks good
to me.
Thanks!
Patrick |
This patch series was integrated into seen via git@7ccbb69. |
This patch series was integrated into next via git@8b145bb. |
On the Git mailing list, "Kristoffer Haugsbakk" wrote:
Hi. I see that this is in `next` now so the following might
be irrelevant.
On Mon, Nov 25, 2024, at 20:00, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <[email protected]>
> [...]
> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> index 76d5c20f141..995ef76f9d6 100644
> --- a/builtin/fast-import.c
> +++ b/builtin/fast-import.c
> @@ -1466,6 +1466,8 @@ static int tree_content_set(
> root->tree = t = grow_tree_content(t, t->entry_count);
> e = new_tree_entry();
> e->name = to_atom(p, n);
> + if (is_dot_or_dotdot(e->name->str_dat))
> + die("path %s contains invalid component", p);
Nit: single-quoting the path seems more common:
$ git grep "\"path '%s'" ':!po/' | wc -l
17
$ git grep "\"path %s" ':!po/' | wc -l
4
> e->versions[0].mode = 0;
> oidclr(&e->versions[0].oid, the_repository->hash_algo);
> t->entries[t->entry_count++] = e;
> diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> index 6224f54d4d2..caf3dc003a0 100755
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -522,6 +522,26 @@ test_expect_success 'B: fail on invalid committer (5)' '
> test_must_fail git fast-import <input
> '
>
> +test_expect_success 'B: fail on invalid file path' '
> + cat >input <<-INPUT_END &&
> + blob
> + mark :1
> + data <<EOF
> + File contents
> + EOF
> +
> + commit refs/heads/badpath
> + committer Name <email> $GIT_COMMITTER_DATE
> + data <<COMMIT
> + Commit Message
> + COMMIT
> + M 100644 :1 ../invalid-path
Maybe the test could be parameterized so that both `..` and `.` can
be tested? Like in `test_path_eol_success`.
--
Kristoffer Haugsbakk
|
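To make the quoting nit concrete, here is a small sketch of the message with the path wrapped in single quotes. The helper name `format_invalid_component` is hypothetical, and it writes into a buffer rather than calling `die()` so that the wording can be inspected; the thread does not show whether the final patch adopted this form.

```c
#include <stdio.h>

/*
 * Hypothetical helper: formats the error text into `buf` instead of
 * calling die(). Returns whatever snprintf() returns.
 */
int format_invalid_component(char *buf, size_t size, const char *path)
{
	/* lowercase start, subject in single quotes, no trailing full stop */
	return snprintf(buf, size, "path '%s' contains invalid component", path);
}
```

With the path `../invalid-path` from the test above, the buffer would hold `path '../invalid-path' contains invalid component`.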
On the Git mailing list, Junio C Hamano wrote:
"Kristoffer Haugsbakk" <[email protected]> writes:
>> + if (is_dot_or_dotdot(e->name->str_dat))
>> + die("path %s contains invalid component", p);
>
> Nit: single-quoting the path seems more common:
>
> $ git grep "\"path '%s'" ':!po/' | wc -l
> 17
> $ git grep "\"path %s" ':!po/' | wc -l
> 4
Ah, I missed that one. Thanks for catching.
We probably should write it down.
--- >8 ---
[PATCH] CodingGuidelines: a handful of error message guidelines
It is more efficient to have something in the coding guidelines
document to point at, when we want to review and comment on a new
message in the codebase to make sure it "fits" in the set of
existing messages.
Let's write down established best practice we are aware of.
Signed-off-by: Junio C Hamano <[email protected]>
---
* I am writing what I think is the established practice from
memory; clarifications, corrections, and additions are all
welcome.
Documentation/CodingGuidelines | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git c/Documentation/CodingGuidelines w/Documentation/CodingGuidelines
index 87904791cb..0444391983 100644
--- c/Documentation/CodingGuidelines
+++ w/Documentation/CodingGuidelines
@@ -703,16 +703,22 @@ Program Output
Error Messages
- - Do not end error messages with a full stop.
+ - Do not end a single-sentence error message with a full stop.
- Do not capitalize the first word, only because it is the first word
- in the message ("unable to open %s", not "Unable to open %s"). But
+ in the message ("unable to open '%s'", not "Unable to open '%s'"). But
"SHA-3 not supported" is fine, because the reason the first word is
capitalized is not because it is at the beginning of the sentence,
but because the word would be spelled in capital letters even when
it appeared in the middle of the sentence.
- - Say what the error is first ("cannot open %s", not "%s: cannot open")
+ - Say what the error is first ("cannot open '%s'", not "%s: cannot open").
+
+ - Enclose the subject of an error inside a pair of single quotes,
+ e.g. `die(_("unable to open '%s'"), path)`.
+
+ - Unless there is a compelling reason not to, error messages should
+ be marked for `_("translation")`.
Externally Visible Names |
On the Git mailing list, Jeff King wrote:
On Tue, Nov 26, 2024 at 07:57:57AM +0100, Patrick Steinhardt wrote:
> On Mon, Nov 25, 2024 at 07:00:48PM +0000, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <[email protected]>
> >
> > If a user specified e.g.
> > M 100644 :1 ../some-file
> > then fast-import previously would happily create a git history where
> > there is a tree in the top-level directory named "..", and with a file
> > inside that directory named "some-file". The top-level ".." directory
> > causes problems. While git checkout will die with errors and fsck will
> > report hasDotdot problems, the user is going to have problems trying to
> > remove the problematic file. Simply avoid creating this bad history in
> > the first place.
>
> Makes sense.
>
> More generally this made me wonder whether we should maybe extract some
> bits out of "fsck.c" so that we don't have to duplicate the checks done
> there in git-fast-import(1). This would for example include checks for
> ".git" and its HFS/NTFS variants as well as tree entry length checks for
> names longer than 4096 characters.
I had the same thought, but I think the right code to be using is
verify_path(). That's what ultimately is used to let names into the
index from trees, from update-index, or from other tools like git-apply.
So I'd consider that authoritative, and fsck is mostly trying to follow
those rules while looking at only a single tree at a time. But
fast-import should have the whole path as a string, just like the index
code does.
-Peff |
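The idea of checking the whole path as a string, rather than one tree entry at a time, can be sketched as follows. This is a deliberately simplified illustration and not Git's `verify_path()`: the name `sketch_path_ok` is hypothetical, and it only rejects empty components, ".", "..", and an exact, case-sensitive ".git". The HFS/NTFS variants and length limits Patrick mentions are not attempted here.

```c
#include <string.h>

/*
 * Hypothetical whole-path check. Returns 1 if every '/'-separated
 * component of `path` passes the simplified rules, 0 otherwise.
 */
int sketch_path_ok(const char *path)
{
	const char *p = path;

	if (!*p)
		return 0;                       /* empty path */
	for (;;) {
		size_t len = strcspn(p, "/");   /* length of this component */

		if (len == 0)
			return 0;               /* leading, trailing or doubled '/' */
		if (len == 1 && p[0] == '.')
			return 0;               /* "." */
		if (len == 2 && p[0] == '.' && p[1] == '.')
			return 0;               /* ".." */
		if (len == 4 && !strncmp(p, ".git", 4))
			return 0;               /* ".git", exact match only */
		if (p[len] == '\0')
			return 1;               /* last component passed */
		p += len + 1;                   /* step past the '/' */
	}
}
```

Because the function sees the full string, a bad component is caught wherever it appears, e.g. both `../some-file` and `dir/./file` are rejected while `dir/file` is accepted.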
On the Git mailing list, Eric Sunshine wrote:
On Wed, Nov 27, 2024 at 8:23 AM Junio C Hamano <[email protected]> wrote:
> [PATCH] CodingGuidelines: a handful of error message guidelines
>
> It is more efficient to have something in the coding guidelines
> document to point at, when we want to review and comment on a new
> message in the codebase to make sure it "fits" in the set of
> existing messages.
>
> Let's write down established best practice we are aware of.
>
> Signed-off-by: Junio C Hamano <[email protected]>
> ---
> diff --git c/Documentation/CodingGuidelines w/Documentation/CodingGuidelines
> @@ -703,16 +703,22 @@ Program Output
> Error Messages
>
> - - Do not end error messages with a full stop.
> + - Do not end a single-sentence error message with a full stop.
>
> - Do not capitalize the first word, only because it is the first word
> - in the message ("unable to open %s", not "Unable to open %s"). But
> + in the message ("unable to open '%s'", not "Unable to open '%s'"). But
> "SHA-3 not supported" is fine, because the reason the first word is
> capitalized is not because it is at the beginning of the sentence,
> but because the word would be spelled in capital letters even when
> it appeared in the middle of the sentence.
>
> - - Say what the error is first ("cannot open %s", not "%s: cannot open")
> + - Say what the error is first ("cannot open '%s'", not "%s: cannot open").
> +
> + - Enclose the subject of an error inside a pair of single quotes,
> + e.g. `die(_("unable to open '%s'"), path)`.
These changes all seem fine.
> + - Unless there is a compelling reason not to, error messages should
> + be marked for `_("translation")`.
We might want to spell this out more fully, such as stating that
messages from porcelain commands should be marked for translation, but
messages in plumbing should not. Also, perhaps mention explicitly that
BUG("message") should not be marked for translation since they are
intended to be read by Git developers, not by end-users. |
On the Git mailing list, Junio C Hamano wrote:
Jeff King <[email protected]> writes:
> I had the same thought, but I think the right code to be using is
> verify_path(). That's what ultimately is used to let names into the
> index from trees, from update-index, or from other tools like git-apply.
Yeah, I agree that is the right helper to use. |
On the Git mailing list, Junio C Hamano wrote:
Taking input from comments by Eric (thanks) on the previous round,
this iteration adds a bit more about Porcelain/Plumbing and BUG().
diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 71e4742fd5..2b8f99f333 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -703,8 +703,15 @@ Error Messages
- Enclose the subject of an error inside a pair of single quotes,
e.g. `die(_("unable to open '%s'"), path)`.
- - Unless there is a compelling reason not to, error messages should
- be marked for `_("translation")`.
+ - Unless there is a compelling reason not to, error messages from the
+ Porcelain command should be marked for `_("translation")`.
+
+ - Error messages from the plumbing commands are sometimes meant for
+ machine consumption and should not be marked for `_("translation")`
+ to keep them 'grep'-able.
+
+ - BUG("message") are for communicating the specific error to
+ developers, and not to be translated.
Externally Visible Names
--- >8 ---
It is more efficient to have something in the coding guidelines
document to point at, when we want to review and comment on a new
message in the codebase to make sure it "fits" in the set of
existing messages.
Let's write down established best practice we are aware of.
Helped-by: Eric Sunshine <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
---
Documentation/CodingGuidelines | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 3263245b03..2b8f99f333 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -689,16 +689,29 @@ Program Output
Error Messages
- - Do not end error messages with a full stop.
+ - Do not end a single-sentence error message with a full stop.
- Do not capitalize the first word, only because it is the first word
- in the message ("unable to open %s", not "Unable to open %s"). But
+ in the message ("unable to open '%s'", not "Unable to open '%s'"). But
"SHA-3 not supported" is fine, because the reason the first word is
capitalized is not because it is at the beginning of the sentence,
but because the word would be spelled in capital letters even when
it appeared in the middle of the sentence.
- - Say what the error is first ("cannot open %s", not "%s: cannot open")
+ - Say what the error is first ("cannot open '%s'", not "%s: cannot open").
+
+ - Enclose the subject of an error inside a pair of single quotes,
+ e.g. `die(_("unable to open '%s'"), path)`.
+
+ - Unless there is a compelling reason not to, error messages from the
+ Porcelain command should be marked for `_("translation")`.
+
+ - Error messages from the plumbing commands are sometimes meant for
+ machine consumption and should not be marked for `_("translation")`
+ to keep them 'grep'-able.
+
+ - BUG("message") are for communicating the specific error to
+ developers, and not to be translated.
Externally Visible Names
--
2.47.1-499-g8536fed62d
|
This branch is now known as |
This patch series was integrated into seen via git@66d1ef3. |
This patch series was integrated into seen via git@abced81. |
On the Git mailing list, Eric Sunshine wrote:
On Wed, Nov 27, 2024 at 7:36 PM Junio C Hamano <[email protected]> wrote:
> It is more efficient to have something in the coding guidelines
> document to point at, when we want to review and comment on a new
> message in the codebase to make sure it "fits" in the set of
> existing messages.
>
> Let's write down established best practice we are aware of.
>
> Helped-by: Eric Sunshine <[email protected]>
> Signed-off-by: Junio C Hamano <[email protected]>
> ---
> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> @@ -689,16 +689,29 @@ Program Output
> Error Messages
>
> - - Say what the error is first ("cannot open %s", not "%s: cannot open")
> + - Say what the error is first ("cannot open '%s'", not "%s: cannot open").
> +
> + - Enclose the subject of an error inside a pair of single quotes,
> + e.g. `die(_("unable to open '%s'"), path)`.
> +
> + - Unless there is a compelling reason not to, error messages from the
> + Porcelain command should be marked for `_("translation")`.
Here you capitalize "Porcelain" but below, "plumbing" is all lowercase.
> + - Error messages from the plumbing commands are sometimes meant for
> + machine consumption and should not be marked for `_("translation")`
> + to keep them 'grep'-able.
Using the same example, `_("translation")`, for both the "should be"
and "should not be" cases may very well confuse readers. (It certainly
confused me.) Perhaps mirroring the example of an item earlier in the
list would be clearer:
- Unless there is a compelling reason not to, error messages from
porcelain commands should be marked for translation, e.g.
`die(_("bad revision"))`
- Error messages from plumbing commands are sometimes meant for
machine consumption, thus should not be marked for translation,
e.g. `die("bad revision")`
> + - BUG("message") are for communicating the specific error to
> + developers, and not to be translated.
Okay, although could be slightly more explicit:
- BUG("message") is for communicating a specific failure to
developers, not end-users, thus should not be translated. |
On the Git mailing list, Junio C Hamano wrote:
Eric Sunshine <[email protected]> writes:
>> + - Unless there is a compelling reason not to, error messages from the
>> + Porcelain command should be marked for `_("translation")`.
>
> Here you capitalize "Porcelain" but below, "plumbing" is all lowercase.
;-) I think that is how we spell them in our documentation when we
contrast them against each other.
>> + - Error messages from the plumbing commands are sometimes meant for
>> + machine consumption and should not be marked for `_("translation")`
>> + to keep them 'grep'-able.
>
> Using the same example, `_("translation")`, for both the "should be"
> and "should not be" cases may very well confuse readers. (It certainly
> confused me.) Perhaps mirroring the example of an item earlier in the
> list would be clearer:
>
> - Unless there is a compelling reason not to, error messages from
> porcelain commands should be marked for translation, e.g.
> `die(_("bad revision"))`
>
> - Error messages from plumbing commands are sometimes meant for
> machine consumption, thus should not be marked for translation,
> e.g. `die("bad revision")`
Thanks, that is much better. Let me steal it verbatim in the
hopefully final reroll.
>> + - BUG("message") are for communicating the specific error to
>> + developers, and not to be translated.
>
> Okay, although could be slightly more explicit:
>
> - BUG("message") is for communicating a specific failure to
> developers, not end-users, thus should not be translated.
The way I read your rewrite is that the "communication" mentioned is
between the program and the user who saw the message. I wanted to
say that the message is seen first by an end-user, and then is
communicated to developers. And not translating is one way to make
sure the message is not mangled, and stays grep-able, during the
game of telephone.
Would this work better?
- In order to help the user who saw BUG("message") to accurately
communicate it to developers, do not mark them for translation.
Thanks. |
On the Git mailing list, Eric Sunshine wrote:
On Thu, Nov 28, 2024 at 4:28 AM Junio C Hamano <[email protected]> wrote:
> Eric Sunshine <[email protected]> writes:
> >> + Porcelain command should be marked for `_("translation")`.
> >
> > Here you capitalize "Porcelain" but below, "plumbing" is all lowercase.
>
> ;-) I think that is how we spell them in our documentation when we
> contrast them against each other.
I must not have been paying close enough attention.
> >> + - BUG("message") are for communicating the specific error to
> >> + developers, and not to be translated.
> >
> > Okay, although could be slightly more explicit:
> >
> > - BUG("message") is for communicating a specific failure to
> > developers, not end-users, thus should not be translated.
>
> The way I read your rewrite is that the "communication" mentioned is
> between the program and the user who saw the message. I wanted to
> say that the message is seen first by an end-user, and then is
> communicated to developers. And not translating is one way to make
> sure the message is not mangled, and stays grep-able, during the
> game of telephone.
>
> Would this work better?
>
> - In order to help the user who saw BUG("message") to accurately
> communicate it to developers, do not mark them for translation.
Let's not spend too much time fine-tuning this. I found your original
clearer than this rewrite. It was just the "and not to be" bit that
made my reading hiccup. Taking your original but substituting in
"thus" may help:
- BUG("message") are for communicating the specific error to
developers, thus should not be translated. |
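The porcelain/plumbing distinction discussed in this sub-thread can be illustrated with a short sketch. It assumes a gettext-style `_()` marker, defined here as a do-nothing stand-in so the example compiles on its own; the function names `porcelain_message` and `plumbing_message` are hypothetical, and the messages are formatted into a buffer instead of being passed to `die()`.

```c
#include <stdio.h>

/* Stand-in for the gettext _() marker; here it returns the string unchanged. */
#define _(msgid) (msgid)

/* Porcelain-style message: marked for translation, subject single-quoted. */
int porcelain_message(char *buf, size_t size, const char *path)
{
	return snprintf(buf, size, _("unable to open '%s'"), path);
}

/* Plumbing-style message: left unmarked so scripts can match it literally. */
int plumbing_message(char *buf, size_t size)
{
	return snprintf(buf, size, "%s", "bad revision");
}
```

In a real build the marker would route the porcelain string through the translation catalog, while the unmarked plumbing string stays byte-for-byte stable, which is the 'grep'-ability the proposed guideline is after.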
Changes since v1:
cc: Eric Sunshine [email protected]
cc: Patrick Steinhardt [email protected]
cc: "Kristoffer Haugsbakk" [email protected]
cc: Jeff King [email protected]